In [ ]:
import networkx as nx
from datetime import datetime
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline
As mentioned earlier, networks, also known as graphs, are composed of individual entities and the relationships between them. The technical terms for these are nodes and edges, and when we draw them we typically use circles (nodes) and lines (edges).
In this notebook, we will work with a synthetic (i.e. simulated) social network, in which nodes are individual people, and edges represent their relationships. If two nodes have an edge between them, then those two individuals know one another.
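In the networkx implementation, graph objects store their data in dictionaries.
Nodes are part of the attribute Graph.node, which is a dictionary where the key is the node ID and the value is a dictionary of attributes.
Edges are part of the attribute Graph.edge, which is a nested dictionary. Data are accessed as such: G.edge[node1][node2]['attr_name'].
These attribute names come from NetworkX 1.x. In NetworkX 2.0 and later, node attributes are accessed with G.nodes[node] and edge attributes with G.edges[node1, node2]['attr_name'] (G[node1][node2]['attr_name'] works in both versions).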
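A minimal sketch of this dictionary layout on a small hypothetical graph H (the names and attribute values are invented). It is written for NetworkX 2.0 or later, where node attributes are read with H.nodes[node] and edge attributes with H.edges[u, v] or H[u][v]:

```python
import networkx as nx

# A tiny hypothetical graph; names and attributes are invented for illustration.
H = nx.Graph()
H.add_node('alice', age=25, sex='Female')
H.add_node('bob', age=31, sex='Male')
H.add_edge('alice', 'bob', weight=3)

# Node attributes: one dictionary per node, looked up by node ID.
print(H.nodes['alice'])             # {'age': 25, 'sex': 'Female'}

# Edge attributes: looked up by both endpoints.
print(H.edges['alice', 'bob'])      # {'weight': 3}
print(H['alice']['bob']['weight'])  # 3

# In an undirected graph both directions point to the same attribute dictionary.
print(H['bob']['alice'] is H['alice']['bob'])  # True
```

Using a separate name (H) keeps the notebook's G untouched if you run this cell.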
Because of the dictionary implementation of the graph, any hashable object can be a node. This means strings, numbers, and tuples can be nodes, but lists and sets cannot.
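A quick check of which objects NetworkX accepts as nodes. The frozenset line is an extra example not mentioned above: it is the immutable, hashable counterpart of a set.

```python
import networkx as nx

H = nx.Graph()
H.add_node('a string')          # strings are hashable
H.add_node(42)                  # so are numbers
H.add_node((1, 2))              # and tuples whose items are hashable
H.add_node(frozenset({3, 4}))   # frozenset: an immutable, hashable set

# Lists and sets are mutable and unhashable, so they cannot be dictionary keys.
for bad in ([1, 2], {1, 2}):
    try:
        H.add_node(bad)
    except TypeError as err:
        print(type(bad).__name__, 'rejected:', err)

print(H.number_of_nodes())  # 4
```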
With this synthetic social network, we will attempt to answer the following basic questions using the NetworkX API:
First off, let's load up the synthetic social network. This will walk you through some of the basics of NetworkX.
For those who are interested, I simply created an Erdős-Rényi graph with n=30 and p=0.1. I used randomized functions that I wrote to generate attributes and append them to each node and edge. I then pickled the graph to disk.
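The attribute-generating functions are not shown here, so the following is only a sketch of how a comparable graph could be built: nx.erdos_renyi_graph creates the random structure, and Python's random module assigns attributes. The helper name make_synthetic_network, the attribute keys ('sex', 'age', 'date'), and the value ranges are all assumptions, not the contents of the original pickle.

```python
import random
from datetime import datetime, timedelta

import networkx as nx


def make_synthetic_network(n=30, p=0.1, seed=None):
    """Hypothetical sketch: an Erdős-Rényi graph with invented attributes."""
    rng = random.Random(seed)
    H = nx.erdos_renyi_graph(n=n, p=p, seed=seed)

    # Hypothetical node attributes (not the original author's generator).
    for node in H.nodes():
        H.nodes[node]['sex'] = rng.choice(['Male', 'Female'])
        H.nodes[node]['age'] = rng.randint(18, 65)

    # Hypothetical edge attribute: the date two people got to know each other.
    start = datetime(2005, 1, 1)
    for u, v in H.edges():
        H.edges[u, v]['date'] = start + timedelta(days=rng.randint(0, 3650))
    return H


H = make_synthetic_network(seed=42)
print(H.number_of_nodes(), H.number_of_edges())
```

To save such a graph, Python's own pickle.dump(H, f) does the job; nx.write_gpickle was a thin wrapper around pickling in older NetworkX versions.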
In [ ]:
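# nx.read_gpickle was removed in NetworkX 3.0, so this cell needs an older NetworkX release.
G = nx.read_gpickle('Synthetic Social Network.pkl') # If you are using Python 2.7, read in 'Synthetic Social Network 27.pkl' instead.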
nx.draw(G)
In [ ]:
# Who is represented in the network?
G.nodes()
Exercise: Can you write a single line of code that returns the number of individuals represented?
In [ ]:
In [ ]:
# Who is connected to whom in the network?
G.edges()
In [ ]:
len(G.edges())
Since this is a social network of people, there will be attributes for each individual, such as age and sex. We can grab that data from the attributes that are stored with each node.
In [ ]:
# Let's get a list of nodes with their attributes.
G.nodes(data=True)
# NetworkX will return a list of tuples in the form (node_id, attribute_dictionary)
In [ ]:
from collections import Counter
Counter([d['sex'] for n, d in G.nodes(data=True)])
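Beyond counting, nx.get_node_attributes collects a single attribute into a {node: value} dictionary, and the same (node, data) unpacking works for filtering. A sketch on a hypothetical three-person graph:

```python
from collections import Counter

import networkx as nx

# Hypothetical people with 'sex' and 'age' attributes.
H = nx.Graph()
H.add_node(1, sex='Male', age=22)
H.add_node(2, sex='Female', age=24)
H.add_node(3, sex='Female', age=35)

# One attribute for every node, as a {node_id: value} dictionary.
ages = nx.get_node_attributes(H, 'age')
print(ages)  # {1: 22, 2: 24, 3: 35}

# Filter nodes on an attribute value.
under_30 = [n for n, d in H.nodes(data=True) if d['age'] < 30]
print(under_30)  # [1, 2]

# Summaries combine naturally with Counter and plain arithmetic.
print(Counter(d['sex'] for n, d in H.nodes(data=True)))
print(sum(ages.values()) / len(ages))  # 27.0
```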
Edges can also store attributes in their attribute dictionary.
In [ ]:
G.edges(data=True)
In this synthetic social network, I have stored the date as a datetime object. Datetime objects have attributes such as .year, .month, and .day.
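These attributes make it easy to filter edges by date. A minimal sketch on a small made-up graph, assuming the edge attribute key is 'date' (check the output of G.edges(data=True) for the actual key in the synthetic network):

```python
from collections import Counter
from datetime import datetime

import networkx as nx

# Hypothetical graph; the key 'date' is an assumption about the attribute name.
H = nx.Graph()
H.add_edge(1, 2, date=datetime(2009, 5, 17))
H.add_edge(2, 3, date=datetime(2010, 1, 9))
H.add_edge(3, 4, date=datetime(2010, 11, 30))

# Keep only the edges whose date falls in 2010, using the .year attribute.
edges_2010 = [(u, v) for u, v, d in H.edges(data=True) if d['date'].year == 2010]
print(edges_2010)  # [(2, 3), (3, 4)]

# Tally edges by calendar month with the .month attribute.
per_month = Counter(d['date'].month for u, v, d in H.edges(data=True))
print(per_month)
```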
In [ ]:
We found out that there are two individuals that we left out of the network, individuals no. 31 and 32. They are one male (31) and one female (32), their ages are 22 and 24 respectively, they have known each other since 2010-01-09, and both of them have known individual 7 since 2009-12-11. Use the functions G.add_node() and G.add_edge() to introduce this data into the network.
If you need more help, check out https://networkx.github.io/documentation/latest/tutorial/tutorial.html
In [ ]:
While we're on the matter of graph construction, let's take a look at our tutorial class. On your sheet of paper, you should have a list of names: these are the people whose names you knew before coming to class.
As we iterate over the class, I would like you to holler out your name, your nationality, and, very slowly, the names of the people you knew in the class.
In [ ]:
## You may choose to join me in this endeavor.
ptG = nx.DiGraph() #ptG stands for PyCon Tutorial Graph.
# Add in nodes and edges
ptG.add_node('', nationality='') # (my own TextExpander shortcut is ;addnode)
ptG.add_edge('', '') # (my own TextExpander shortcut is ;addedge)
# We are now going to draw the network using a hive plot, grouping the nodes by the top two nationality groups, and 'others'
# for the third group.
nodes = dict()
nodes['group1'] = [] #list comprehension here
nodes['group2'] = [] #list comprehension here
nodes['group3'] = [] #list comprehension here
edges = dict()
edges['group1'] = [] #list comprehension here
nodes_cmap = dict()
nodes_cmap['group1'] = 'blue'
nodes_cmap['group2'] = 'green'
nodes_cmap['group3'] = 'black'
edges_cmap = dict()
edges_cmap['group1'] = 'black'
from hiveplot import HivePlot
h = HivePlot(nodes, edges, nodes_cmap, edges_cmap)
# h.set_minor_angle(np.pi / 32) #optional
h.draw()
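One way to fill in the placeholders above: count nationalities with collections.Counter, take the two most common as group1 and group2, and put everyone else in group3. This sketch runs on a small hypothetical graph toy (names and nationalities are made up) and stores its results in toy_nodes and toy_edges so it does not overwrite the class data; swap in ptG, nodes, and edges to use it. The edge format, (source, target, attribute_dict) tuples as produced by edges(data=True), is an assumption about what HivePlot expects, so check the hiveplot documentation. Ties between equally common nationalities are resolved by Counter.most_common's ordering.

```python
from collections import Counter

import networkx as nx

# Hypothetical stand-in for ptG.
toy = nx.DiGraph()
roster = [('Ana', 'Brazil'), ('Ben', 'USA'), ('Cho', 'Korea'),
          ('Dev', 'USA'), ('Eli', 'Brazil'), ('Fay', 'USA')]
for name, nationality in roster:
    toy.add_node(name, nationality=nationality)
toy.add_edges_from([('Ana', 'Ben'), ('Ben', 'Dev'), ('Cho', 'Fay'), ('Eli', 'Ana')])

# The two most common nationalities define group1 and group2.
counts = Counter(d['nationality'] for n, d in toy.nodes(data=True))
(top1, _), (top2, _) = counts.most_common(2)

toy_nodes = dict()
toy_nodes['group1'] = [n for n, d in toy.nodes(data=True) if d['nationality'] == top1]
toy_nodes['group2'] = [n for n, d in toy.nodes(data=True) if d['nationality'] == top2]
toy_nodes['group3'] = [n for n, d in toy.nodes(data=True)
                       if d['nationality'] not in (top1, top2)]

# A single edge group holding every edge (assumed format: (u, v, attr_dict)).
toy_edges = dict()
toy_edges['group1'] = [(u, v, d) for u, v, d in toy.edges(data=True)]
```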
A similar pattern can be used for edges:
[n2 for n1, n2, d in G.edges(data=True)]
or
[n2 for _, n2, d in G.edges(data=True)]
If the graph you are constructing is a directed graph, with a "source" and "sink" available, then I would recommend the following pattern:
[(sc, sk) for sc, sk, d in G.edges(data=True)]
or
[d['attr'] for sc, sk, d in G.edges(data=True)]
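The patterns above can be tried on a small made-up directed graph D; the attribute name 'attr' mirrors the placeholder used above.

```python
import networkx as nx

# Hypothetical directed graph with an 'attr' value on every edge.
D = nx.DiGraph()
D.add_edge('a', 'b', attr=1)
D.add_edge('b', 'c', attr=2)
D.add_edge('c', 'a', attr=3)

# Targets ("sinks") of every edge.
sinks = [sk for _, sk, d in D.edges(data=True)]
# (source, sink) pairs, discarding the attribute dictionaries.
pairs = [(sc, sk) for sc, sk, d in D.edges(data=True)]
# One attribute value per edge.
values = [d['attr'] for sc, sk, d in D.edges(data=True)]
# A condition on the attribute fits into the same comprehension.
heavy = [(sc, sk) for sc, sk, d in D.edges(data=True) if d['attr'] >= 2]

print(pairs)   # [('a', 'b'), ('b', 'c'), ('c', 'a')]
print(values)  # [1, 2, 3]
print(heavy)   # [('b', 'c'), ('c', 'a')]
```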
In [ ]:
nx.draw(G)
If the network is small enough to visualize, and the node labels are small enough to fit in a circle, then you can use the with_labels=True argument.
In [ ]:
nx.draw(G, with_labels=True)
However, note that if the number of nodes in the graph gets really large, node-link diagrams can begin to look like massive hairballs. This is undesirable for graph visualization.
Instead, we can use a matrix to represent them. The nodes are on the x- and y-axes, and a filled square represents an edge between the nodes. This is done by using the nx.to_numpy_matrix(G) function. (In NetworkX 2.0 and later, nx.to_numpy_array(G) returns a plain NumPy array instead; to_numpy_matrix was removed in NetworkX 3.0.)
We then use matplotlib's pcolor(numpy_array) function to plot. Because pcolor cannot take in numpy matrices, we will convert the matrix to a regular NumPy array, and then get pcolor to plot it.
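One detail worth knowing: the rows and columns of the matrix follow the order of G.nodes() unless you pass an explicit nodelist. A small sketch on a hypothetical graph, using nx.to_numpy_array (NetworkX 2.0+), which returns a plain NumPy array:

```python
import networkx as nx
import numpy as np

# Hypothetical path graph 3 - 1 - 2, with nodes inserted out of numeric order.
H = nx.Graph()
H.add_edges_from([(3, 1), (1, 2)])
print(list(H.nodes()))  # [3, 1, 2]

# Without nodelist, row/column i corresponds to list(H.nodes())[i].
# Passing a sorted nodelist makes row/column i correspond to order[i].
order = sorted(H.nodes())
A = nx.to_numpy_array(H, nodelist=order)
print(A)

# An undirected graph always gives a symmetric adjacency matrix.
print(np.array_equal(A, A.T))  # True
```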
In [ ]:
matrix = nx.to_numpy_matrix(G)
plt.pcolor(np.array(matrix))
plt.gca().set_aspect('equal') # set aspect ratio equal on the current axes to get a square visualization
n = len(G) # the matrix has one row and one column per node
plt.xlim(0, n) # set x and y limits to the number of nodes present.
plt.ylim(0, n)
plt.title('Adjacency Matrix')
plt.show()
Let's try another visualization, the Circos plot. We can order the nodes in the Circos plot according to the node ID, but any other ordering is possible as well. Each edge is drawn across the circle between the two nodes it connects.
Credit goes to Justin Zabilansky (MIT) for the implementation.
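The idea of node ordering can be sketched independently of CircosPlot: each node gets an evenly spaced angle on a circle, in whatever order the node list is given. circle_positions below is a hypothetical helper illustrating that geometry, not the CircosPlot implementation.

```python
import networkx as nx
import numpy as np


def circle_positions(ordered_nodes, radius=10):
    """Hypothetical helper: evenly spaced (x, y) points on a circle, in list order."""
    angles = np.linspace(0, 2 * np.pi, num=len(ordered_nodes), endpoint=False)
    return {n: (radius * np.cos(a), radius * np.sin(a))
            for n, a in zip(ordered_nodes, angles)}


H = nx.erdos_renyi_graph(n=10, p=0.3, seed=1)

# Order by node ID...
by_id = circle_positions(sorted(H.nodes()))
# ...or by any other key, e.g. degree, so that the hubs sit next to each other.
by_degree = circle_positions(sorted(H.nodes(), key=lambda n: H.degree(n), reverse=True))
```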
In [ ]:
from circos import CircosPlot
fig = plt.figure(figsize=(6,6))
ax = fig.add_subplot(111)
nodes = sorted(G.nodes())
edges = G.edges()
c = CircosPlot(nodes, edges, radius=10, ax=ax)
c.draw()
It's pretty obvious in this visualization that some nodes, such as nodes 5 and 18, are not connected to any other node by an edge. Other nodes, such as node 19, are highly connected to other nodes.
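Rather than eyeballing the plot, the same observations can be checked from node degrees. A minimal sketch on a made-up graph that mimics the situation described (the node numbers are chosen only to echo the text):

```python
import networkx as nx

# Hypothetical graph echoing the text: 5 and 18 are isolated, 19 is a hub.
H = nx.Graph()
H.add_nodes_from([5, 18])                          # nodes with no edges
H.add_edges_from([(19, n) for n in (1, 2, 3, 4)])  # 19 touches four nodes
H.add_edge(1, 2)

# Isolated nodes are those with degree 0.
print(sorted(nx.isolates(H)))  # [5, 18]

# Degree = number of edges attached to a node; the maximum marks the hub.
degrees = dict(H.degree())
hub = max(degrees, key=degrees.get)
print(hub, degrees[hub])  # 19 4
```

On the notebook's own graph, sorted(nx.isolates(G)) and dict(G.degree()) apply unchanged.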
In [ ]: